Overview of Automatic Genre Identification
Abstract
Genre is a category of artistic, musical, or literary composition characterized by a particular style, form, or content (according to the Merriam-Webster Online Dictionary [2], which also lists two other, less relevant definitions). It is debatable whether a genre can be defined by style (and form) alone, or whether content (topic) must also be taken into account. For example, can science fiction be distinguished from all other literary genres by style alone, or must one also consider that it talks about spaceships and time travel? For the purpose of information retrieval, however, the topic component of genre can and should be disregarded. Topic is usually treated separately and characterized by keywords (and other methods), so genre should be orthogonal to topic. This report is written with a general corpus such as the WWW in mind. Genres that cannot easily be distinguished by style are probably too fine-grained for such a corpus anyway. It would be useful to be able to search for science fiction, but at this granularity there are simply too many genres. It is probably best to limit oneself to genres such as homepage, FAQ, and scientific paper. Another common way to describe the genre of a document is through its purpose, but any attempt to detect a document's purpose automatically leads back to style.
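The idea that genre should be orthogonal to topic can be illustrated with a minimal, hypothetical sketch. The features below are computed only from surface style: sentence length, the rate of function words, question marks and digits. Topical keywords never enter the feature vector. The feature set, the function-word list and the name `style_features` are illustrative assumptions, not the method described in this report.

```python
import re

# Hypothetical, illustrative list of English function words (closed-class words).
# Function words carry little topical content, so their rate is a style signal
# that tends to stay stable across topics.
FUNCTION_WORDS = frozenset({
    "the", "a", "an", "of", "to", "in", "and", "or", "but", "is", "are",
    "was", "were", "it", "this", "that", "for", "on", "with", "as", "by",
    "be", "not", "you", "i", "we", "what", "how", "can", "do",
})


def style_features(text: str) -> dict[str, float]:
    """Return a small vector of topic-independent style features (illustrative)."""
    words = re.findall(r"[A-Za-z']+", text)
    # Treat runs of ., ! or ? as sentence boundaries; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_sentences = len(sentences) or 1  # avoid division by zero on empty input
    n_words = len(words) or 1
    n_chars = len(text) or 1
    lowered = [w.lower() for w in words]
    return {
        "avg_sentence_len": len(words) / n_sentences,
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "function_word_rate": sum(w in FUNCTION_WORDS for w in lowered) / n_words,
        "question_rate": text.count("?") / n_sentences,
        "digit_rate": sum(c.isdigit() for c in text) / n_chars,
    }


if __name__ == "__main__":
    # Two FAQ-style texts on unrelated topics, plus a report-style text
    # that shares its topic with the first FAQ.
    faq_space = "What is a spaceship? It is a vehicle. How do I board it? You use the hatch."
    faq_money = "What is a mortgage? It is a loan. How do I get it? You use the form."
    report_space = "The spaceship landed at 0900. The crew left through the hatch."
    for name, doc in [("faq_space", faq_space), ("faq_money", faq_money),
                      ("report_space", report_space)]:
        print(name, style_features(doc))
```

In this toy case the two FAQs share no topic words, yet they get identical question rates, average sentence lengths and function-word rates. The report about the same spaceship has no questions and contains digits. This reflects the grouping the abstract argues for: by genre rather than by topic. A real system would feed such vectors to a trained classifier.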